Gas Data
The gas data represents information about various gas stations,
including their location, services offered, population of compromised
individuals (POC), and other relevant details. Here’s an explanation of
the columns in the dataset:
- X: Row identifier.
- site_row_id: Identifier for the site.
- STATE: State where the gas station is located.
- county: County where the gas station is located.
- ADDRESS: Street address of the gas station.
- CITY: City where the gas station is located.
- ycoord: Latitude coordinate of the gas station.
- xcoord: Longitude coordinate of the gas station.
- SITE_DESCRIPTION: Description of the gas station site.
- service_or_fuel: Indicates whether the station provides service, fuel, or both.
- diesel: Indicates if diesel fuel is available at the station.
- twentyfour_hour_flag: Indicates if the station operates 24 hours.
- car_wash: Indicates if the station has a car wash service.
- truckstop_flag: Indicates if the station is a truck stop.
- description: Additional description of the gas station.
- PUMP_TECH: Pump technology used at the gas station.
- POC: Population of compromised individuals.
- HIFCA: High Intensity Financial Crime Area.
- ZIPnew: ZIP code of the gas station.
- POCAGE: Age distribution of the population of compromised individuals.
- POCGAP: Age gap distribution of the population of compromised individuals.
- ZIPPOC: ZIP code of the population of compromised individuals.
- HFG: Human Factors Geometry.
- MSA: Metropolitan Statistical Area.
- dist.to.poc: Distance to the population of compromised individuals.
- cate.poc.density: Categorized population of compromised individuals density.
- cate.poc.age: Categorized population of compromised individuals age.
- cate.poc.age.20: Categorized population of compromised individuals age group 20.
- cate.poc.intensity: Categorized population of compromised individuals intensity.
- cate.poc.intensity.tot: Total categorized population of compromised individuals intensity.
- MSA_POC: Metropolitan Statistical Area population of compromised individuals.
- MSA_POC.1: Another column indicating Metropolitan Statistical Area population of compromised individuals.
This dataset contains detailed information about gas stations and the
population they serve, including geographical coordinates, services
offered, and demographic characteristics of the surrounding
population.
gas <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/POC.csv")
Simple Leaflet Map
In the map below, each marker shows the location of a gas station, with
longitude and latitude taken from the “xcoord” and “ycoord” columns of
the gas dataset, respectively. Only a random sample of 500 stations is
plotted. Clicking a marker opens a popup with the state, county,
address, and ZIP code of that gas station. Explore the interactive map
to see how gas stations are distributed across different regions:
# Load the packages used for sampling and mapping
library(dplyr)
library(leaflet)
# Draw a random sample of 500 gas stations
gas_samp <- gas %>% sample_n(500)
# Create a leaflet map
gas_map <- leaflet(data = gas_samp) %>%
  addTiles() %>%
  addMarkers(
    lng = ~xcoord,
    lat = ~ycoord,
    popup = ~paste("State: ", STATE, "<br>",
                   "County: ", county, "<br>",
                   "Address: ", ADDRESS, "<br>",
                   "Zip Code: ", ZIPnew)
  )
# Display the map
gas_map
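Because the sample is random, a different set of 500 stations is drawn each time the code runs, so the map changes between renders. Below is a minimal sketch of one way to make the draw reproducible, assuming a fixed seed is acceptable. The toy data frame “stations” and the seed value are illustrative and not part of the original analysis.

```r
# Toy stand-in for the gas data (hypothetical values)
stations <- data.frame(
  id = 1:1000,
  xcoord = runif(1000, -80, -75),
  ycoord = runif(1000, 39, 42)
)

# Fix the random seed so the same rows are drawn on every run
set.seed(2024)
samp_a <- stations[sample(nrow(stations), 500), ]

# Re-setting the same seed reproduces the same sample
set.seed(2024)
samp_b <- stations[sample(nrow(stations), 500), ]

identical(samp_a$id, samp_b$id)
```

The same idea should apply to the dplyr version: calling set.seed() immediately before sample_n() fixes which rows are selected.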
Leaflet Map
Below, we refine the leaflet map by replacing the default markers with
circle markers. Each circle has a fixed radius of 5, is filled in blue
at 40% opacity, and is drawn on a gray canvas basemap, with a legend in
the bottom-right corner. As in the previous map, each circle represents
a gas station, and hovering over it reveals the state, county, address,
and ZIP code of that station.
# Create a leaflet map
gas_map2 <- leaflet(data = gas_samp) %>%
  # Center the view on the mean coordinates of the sample
  setView(lng = mean(gas_samp$xcoord),
          lat = mean(gas_samp$ycoord),
          zoom = 13) %>%
  # Gray canvas basemap (used instead of the default tiles)
  addProviderTiles("Esri.WorldGrayCanvas") %>%
  addCircleMarkers(
    lng = ~xcoord,
    lat = ~ycoord,
    color = "blue", # Adjust color as needed
    radius = 5, # Adjust radius as needed
    stroke = FALSE,
    fillOpacity = 0.4,
    # Hover labels are plain text, so fields are separated with commas
    label = ~paste0("State: ", STATE,
                    ", County: ", county,
                    ", Address: ", ADDRESS,
                    ", Zip Code: ", ZIPnew)
  ) %>%
  addLegend(position = "bottomright",
            colors = "blue", # Adjust color as needed
            labels = "Gas Station",
            title = "Gas Stations",
            opacity = 0.4)
# Display the map
gas_map2
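Centering the view on the mean coordinates at a fixed zoom of 13 shows only a small area, which may contain few circles if the sampled stations are spread widely. As a hedged alternative, the view can be fitted to the bounding box of the points. The sketch below uses toy coordinates (hypothetical values) in place of gas_samp and only builds the map when the leaflet package is installed.

```r
# Toy coordinates standing in for gas_samp (hypothetical values)
pts <- data.frame(
  xcoord = c(-75.16, -79.99, -76.88, -80.08),
  ycoord = c(39.95, 40.44, 40.27, 42.13)
)

# Bounding box: west/east from longitude, south/north from latitude
lng_range <- range(pts$xcoord)
lat_range <- range(pts$ycoord)

# Only build the map if leaflet is available
if (requireNamespace("leaflet", quietly = TRUE)) {
  m <- leaflet::leaflet(pts)
  m <- leaflet::addProviderTiles(m, "Esri.WorldGrayCanvas")
  m <- leaflet::addCircleMarkers(m, lng = ~xcoord, lat = ~ycoord,
                                 radius = 5, stroke = FALSE, fillOpacity = 0.4)
  # Fit the view to the bounding box instead of a fixed center and zoom
  m <- leaflet::fitBounds(m,
                          lng1 = lng_range[1], lat1 = lat_range[1],
                          lng2 = lng_range[2], lat2 = lat_range[2])
}
```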
Best Map
This version encodes two additional variables. The radius of each
circle is proportional to the number of Points of Compromise (POCs) in
the gas station’s ZIP code (the “ZIPPOC” column, multiplied by 10), so
larger circles mark ZIP codes with more POCs and provide a visual
indicator of potential risk areas.
Additionally, the color of each point corresponds to the type of
services offered by the gas station: “Fuel”, “Service Only”, or “Both”.
However, since the dataset does not include any gas stations that offer
“Service Only”, only the categories “Fuel” and “Both” will be displayed
on the map.
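One way to check which service categories are actually present is to tabulate the column against the full set of expected levels. The sketch below uses a toy vector (hypothetical values) rather than the real “service_or_fuel” column, and the level names are taken from the categories listed above.

```r
# Toy service column (hypothetical values; the real column is gas$service_or_fuel)
service <- c("Fuel", "Both", "Fuel", "Fuel", "Both")

# Fixing the factor levels makes absent categories appear with a count of zero
service_counts <- table(factor(service, levels = c("Fuel", "Service Only", "Both")))

service_counts
```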
As with the previous maps, hovering over a point reveals detailed
information such as the state, county, address, and ZIP code associated
with the respective gas station. Explore the map to gain insights into
the distribution of gas stations and their associated services.
# Create a color palette based on service_or_fuel values
service_palette <- colorFactor(palette = "Set1", domain = gas_samp$service_or_fuel)
# Service categories present in the sample, used for the legend
service_levels <- unique(gas_samp$service_or_fuel)
# Create the leaflet map
gas_map3 <- leaflet(data = gas_samp) %>%
  addProviderTiles("Esri.WorldGrayCanvas") %>%
  addCircleMarkers(
    lng = ~xcoord,
    lat = ~ycoord,
    color = ~service_palette(service_or_fuel), # Color by service type
    radius = ~ZIPPOC * 10, # Scale radius by POCs in the ZIP code
    stroke = FALSE,
    fillOpacity = 0.4,
    # Wrap each label in htmltools::HTML() so that <br> renders as a line break
    label = ~lapply(paste0("State: ", STATE, "<br>",
                           "County: ", county, "<br>",
                           "Address: ", ADDRESS, "<br>",
                           "Zip Code: ", ZIPnew),
                    htmltools::HTML)
  ) %>%
  addLegend(position = "bottomright",
            colors = service_palette(service_levels),
            labels = service_levels,
            title = "Gas Stations",
            opacity = 0.4)
# Display the map
gas_map3
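Multiplying the count by a constant makes the circle radius grow linearly, so the circle area grows with the square of the count and large values can dominate the map. A common alternative is square-root scaling, where the area rather than the radius tracks the count. The helper below is a minimal sketch of that idea. The name “scale_radius” and the default minimum and maximum radii are illustrative choices, not part of the original map.

```r
# Map counts to marker radii so that circle area tracks the count.
# min_radius and max_radius are illustrative defaults.
scale_radius <- function(count, min_radius = 3, max_radius = 20) {
  count <- pmax(count, 0)              # treat negative input as zero
  top <- max(count, na.rm = TRUE)      # largest count sets the largest circle
  if (top == 0) {
    return(rep(min_radius, length(count)))
  }
  # Square-root scaling, with a floor so small counts stay visible
  pmax(min_radius, max_radius * sqrt(count / top))
}

scale_radius(c(0, 1, 4, 16))
```

In the map code, such a helper could be passed as the radius formula, for example radius = ~scale_radius(ZIPPOC).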
Philly Crime Data
The Philadelphia crime dataset contains information on various
incidents, including details such as demographic characteristics,
incident severity, location, and other relevant attributes. Here’s an
explanation of the dataset columns:
- dc_key: A unique identifier for each incident.
- race: Specifies the racial background of the individuals involved, categorized as Black (Non-Hispanic), Hispanic (Black or White), and so on.
- sex: Indicates the gender of the individuals involved, classified as Male or Female.
- fatal: Indicates whether the incident resulted in a fatality (Fatal) or not (Nonfatal).
- date: Records the date and time when the incident occurred.
- has_court_case: Specifies whether the incident is associated with a court case (Yes/No).
- age: Represents the age of the individuals involved in the incident.
- street_name: Denotes the name of the street where the incident took place.
- block_number: Indicates the block number related to the incident’s location.
- zip_code: Provides the ZIP code of the incident location.
- council_district: Identifies the council district corresponding to the incident location.
- police_district: Identifies the police district corresponding to the incident location.
- neighborhood: Specifies the neighborhood where the incident occurred.
- house_district: Identifies the house district associated with the incident location.
- senate_district: Identifies the senate district associated with the incident location.
- school_catchment: Specifies the school catchment area associated with the incident location.
- lng: Represents the longitude coordinate of the incident location.
- lat: Represents the latitude coordinate of the incident location.
This dataset provides insight into the demographics of the individuals
involved, the nature and severity of the incidents, and their spatial
distribution across neighborhoods and districts within Philadelphia.
Analyzing it can help identify patterns, trends, and areas of concern
related to crime and public safety in the city.
We narrow the dataset to incidents from 2023. Since no variable records
the year directly, we derive a ‘year’ variable from the existing ‘date’
variable and then keep only the observations from 2023. The filtered
dataset comprises 1666 observations and 19 variables, including the
newly added ‘year’.
philly <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/PhillyCrimeSince2015.csv")
# Convert date variable to date format
philly$date <- as.Date(philly$date, format = "%m/%d/%Y %H:%M")
# Extract year from date variable
philly$year <- format(philly$date, "%Y")
philly <- subset(philly, year=="2023")
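To make the year-derivation step concrete, here is a small sketch on a toy table. The four rows are hypothetical and only mimic the “month/day/year hour:minute” layout assumed by the format string above.

```r
# Toy incidents with timestamps in the "month/day/year hour:minute" layout
toy <- data.frame(
  dc_key = 1:4,
  date = c("12/31/2022 23:50", "01/01/2023 00:05",
           "07/04/2023 14:30", "01/02/2024 09:15")
)

# Parse the timestamp; as.Date() keeps only the calendar date
toy$date <- as.Date(toy$date, format = "%m/%d/%Y %H:%M")

# Derive the year as a character string and keep 2023 only
toy$year <- format(toy$date, "%Y")
toy_2023 <- subset(toy, year == "2023")

dim(toy_2023)
```

Timestamps that do not match the format string are converted to NA and are then dropped by subset(), so checking sum(is.na(philly$date)) after the conversion is a cheap way to confirm that no rows were lost to parsing.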
Leaflet Map
Now, let’s visualize fatal versus non-fatal crimes that occurred in
Philadelphia in 2023 on a leaflet map. We again use the “color”
argument of the circle markers to distinguish the two outcomes: “Fatal”
incidents are drawn in red and “Nonfatal” incidents in blue. The map
follows a similar format to the ones above, with each circle marking a
specific crime incident. Hovering over a circle reveals the
“Neighborhood,” “Date,” “Race,” “Sex,” “Age,” and “Street” associated
with that incident. On visual inspection, there appears to be a notable
disparity between the number of non-fatal and fatal crimes; confirming
this would require counting the incidents in each category.
library(leaflet)
library(dplyr)
# Colors for fatal and non-fatal crimes
# (named *_color so they do not clash with the 'fatal' column in formulas)
fatal_color <- "red"
non_fatal_color <- "blue"
# Create leaflet map
map <- leaflet(philly) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~lng, lat = ~lat,
    color = ~ifelse(fatal == "Fatal", fatal_color, non_fatal_color),
    radius = 5,
    # Hover labels are plain text, so fields are separated with commas
    label = ~paste0("Neighborhood: ", neighborhood,
                    ", Date: ", date,
                    ", Race: ", race,
                    ", Sex: ", sex,
                    ", Age: ", age,
                    ", Street: ", street_name),
    labelOptions = labelOptions(
      direction = "auto"
    )
  ) %>%
  addLegend(
    position = "bottomright",
    colors = c(fatal_color, non_fatal_color),
    labels = c("Fatal", "Non-Fatal"),
    title = "Crime Type"
  ) %>%
  addScaleBar() %>%
  # The data were filtered to 2023 above, so the title states that year
  addControl(
    html = "<h4>Philadelphia Crime Locations (2023)</h4>",
    position = "topright"
  )
# Display the map
map
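The visual impression of a disparity between fatal and non-fatal incidents can be checked with a simple frequency table. The sketch below uses a toy outcome vector (hypothetical values), not the real “fatal” column, so the numbers it produces say nothing about the Philadelphia data.

```r
# Toy outcome column (hypothetical values; the real column is philly$fatal)
outcome <- c("Nonfatal", "Nonfatal", "Fatal", "Nonfatal",
             "Fatal", "Nonfatal", "Nonfatal", "Nonfatal")

# Counts per category
outcome_counts <- table(outcome)

# Shares per category, as percentages rounded to one decimal place
outcome_share <- round(prop.table(outcome_counts) * 100, 1)

outcome_counts
outcome_share
```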
Better Leaflet Map
Now, let’s create an enhanced leaflet map of fatal versus non-fatal
crimes in Philadelphia. This version reads the incident points and the
neighborhood boundaries from GeoJSON files and draws the neighborhood
outlines beneath the markers. The “color” argument again encodes the
outcome, with fatal incidents in red and non-fatal incidents in gold.
Each crime location is shown as a circle marker, and clicking a marker
opens a popup with the “Object ID”, “Year”, “Race”, “Sex”, “Age”,
“Wound”, and “Location” of that incident.
# Load required libraries
library(leaflet)
library(sf)
# Suppress warnings while reading the GeoJSON files
options(warn = -1)
# Read the data without printing messages; st_read() already returns sf objects
philly <- st_read("https://pengdsci.github.io/STA553VIZ/w08/PhillyShootings.geojson", quiet = TRUE)
phillyNeighbor <- st_read("https://pengdsci.github.io/STA553VIZ/w08/Neighborhoods_Philadelphia.geojson", quiet = TRUE)
# Reset warning settings
options(warn = 0)
# Define colors for fatal and non-fatal crimes
fatal_color <- "red"
non_fatal_color <- "gold"
# Create leaflet map
map <- leaflet() %>%
  addProviderTiles(providers$Esri.WorldGrayCanvas) %>%
  # Neighborhood outlines
  addPolygons(data = phillyNeighbor,
              color = 'skyblue',
              weight = 1) %>%
  # One circle per incident, colored by outcome
  addCircleMarkers(data = philly,
                   lng = ~point_x, lat = ~point_y,
                   color = ~ifelse(fatal == 1, fatal_color, non_fatal_color),
                   radius = 5,
                   popup = ~paste("Object ID: ", objectid,
                                  "<br>Year: ", year,
                                  "<br>Race: ", race,
                                  "<br>Sex: ", sex,
                                  "<br>Age: ", age,
                                  "<br>Wound: ", wound,
                                  "<br>Location: ", location)
  ) %>%
  addLegend(
    position = "bottomright",
    colors = c(fatal_color, non_fatal_color),
    labels = c("Fatal", "Non-Fatal"),
    title = "Crime Type"
  ) %>%
  addScaleBar() %>%
  addControl(
    html = "<h4>Philadelphia Crime Locations (2015-2024)</h4>",
    position = "topright"
  ) %>%
  setView(lng = -75.1527, lat = 39.9707, zoom = 11)
# Display the map
map
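The popup content is simply one HTML string per incident, built by pasting column values together. The sketch below shows that step in isolation on a toy table. The values are hypothetical, and the “Outcome” line is an illustrative addition that is not part of the original popup.

```r
# Toy incident records (hypothetical values)
incidents <- data.frame(
  objectid = c(101, 102),
  year = c(2022, 2023),
  fatal = c(1, 0)
)

# One HTML string per row; "<br>" starts a new line inside a leaflet popup
popup_text <- paste0(
  "Object ID: ", incidents$objectid,
  "<br>Year: ", incidents$year,
  "<br>Outcome: ", ifelse(incidents$fatal == 1, "Fatal", "Non-Fatal")
)

popup_text
```

Popups render HTML and open on click, whereas hover labels are shown as plain text unless each string is wrapped with htmltools::HTML().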
U.S. Presidential Election Data (2000-2020)
Our initial dataset, named “election”, encompasses Presidential
election outcomes spanning the years 2000, 2004, 2008, 2012, 2016, and
2020. With 72,617 observations and 12 variables, it provides
comprehensive insights into each state’s and county’s election results,
detailing the winning candidate in each county, along with the total
votes received by each candidate.
Prior to analysis, some data cleaning was imperative, particularly
concerning the county FIPS codes—a unique 5-digit identifier assigned to
every county in the United States. Initially, certain codes erroneously
contained only 4 digits, notably when a “0” preceded the first digit.
For instance, Autauga County, Alabama’s FIPS code “01001” was recorded
as “1001” in the dataset. This discrepancy was rectified using the
“TEXT” function in Excel, applied before importing the data into the
“election” set.
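The same padding can also be done in R rather than Excel. The following is a minimal sketch, assuming the codes have been read as numbers and have lost their leading zero. The three values are illustrative.

```r
# FIPS codes as they look after the leading zero has been dropped (illustrative values)
fips_raw <- c(1001, 6037, 42101)

# Pad to five digits with leading zeros; the result is a character vector
fips_padded <- sprintf("%05d", as.integer(fips_raw))

fips_padded
```

Note that read.csv() converts an all-digit column to a number by default, so a padded code stored in a CSV file loses its zero again on import unless the column is read as text, for example through the colClasses argument.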
Utilizing the “election” dataset, our objective is to split the data
into county-level and state-level subsets. Both subsets include a new
variable named “party_percentage,” calculated to ascertain the
percentage of voters favoring the winning party within their respective
state or county. The “county_data” subset provides election results
categorized by county, while the “state_data” subset presents election
outcomes aggregated by state. Furthermore, both subsets retain solely
the winning party’s information for analysis.
# Load the required library
library(dplyr)
# Read the data, keeping county_fips as text so leading zeros survive the import
election <- read.csv("https://ecoleman451.github.io/website/Data%20Visualization/Datasets/PresidentialElection2000To2020.csv",
                     colClasses = c(county_fips = "character"))
# County-level Data: vote share within each county, keeping the winner
county_data <- election %>%
  group_by(year, state, county_name) %>%
  mutate(party_percentage = candidatevotes / sum(candidatevotes) * 100) %>%
  filter(party_percentage == max(party_percentage)) %>%
  ungroup() %>%
  select(year, state, county_name, county_fips, party, candidate, candidatevotes, party_percentage)
# State-level Data: sum the county votes per candidate first,
# then compute each candidate's share of the state total and keep the winner
state_data <- election %>%
  group_by(year, state, party, candidate) %>%
  summarise(candidatevotes = sum(candidatevotes), .groups = "drop") %>%
  group_by(year, state) %>%
  mutate(party_percentage = candidatevotes / sum(candidatevotes) * 100) %>%
  filter(party_percentage == max(party_percentage)) %>%
  ungroup() %>%
  select(year, state, party, candidate, candidatevotes, party_percentage)
# Save county-level data to a new CSV file
write.csv(county_data, file = "county_level_data.csv", row.names = FALSE)
# Save state-level data to a new CSV file
write.csv(state_data, file = "state_level_data.csv", row.names = FALSE)
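For the state-level subset, the county rows need to be summed per party before the percentage is computed. Otherwise the share compares a single county row against the statewide total. The sketch below illustrates this on a toy table in base R. The state, counties, and vote counts are hypothetical.

```r
# Toy county-level results for one state and one year (hypothetical values)
toy <- data.frame(
  year = 2020,
  state = "A",
  county_name = c("X", "X", "Y", "Y"),
  party = c("DEMOCRAT", "REPUBLICAN", "DEMOCRAT", "REPUBLICAN"),
  candidatevotes = c(60, 40, 10, 70)
)

# Step 1: sum county votes per party within each state and year
state_totals <- aggregate(candidatevotes ~ year + state + party, data = toy, FUN = sum)

# Step 2: each party's share of all votes cast in the state
state_total_votes <- ave(state_totals$candidatevotes,
                         state_totals$year, state_totals$state, FUN = sum)
state_totals$party_percentage <- state_totals$candidatevotes / state_total_votes * 100

# Step 3: keep the party with the largest share in each state and year
top_share <- ave(state_totals$party_percentage,
                 state_totals$year, state_totals$state, FUN = max)
state_winners <- state_totals[state_totals$party_percentage == top_share, ]

state_winners
```

In this toy example county X is won by one party and county Y by the other, while the statewide winner is decided by the summed totals (110 of 180 votes).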
Choropleth Map
Now that “county_data” holds the winning party for each county, we can
use Tableau, an interactive data visualization tool, to build a
choropleth map of presidential election outcomes at the county level.
Each county is shaded by the party that won it in a given election
year, with distinct colors for the two major parties (Democratic and
Republican). The interactive map includes a filter for the displayed
year(s), and hovering over a county shows its “year,” “state,” “party,”
“candidatevotes,” and “party_percentage”.
